He He

PaintBench: Deterministic Evaluation of Precise Visual Editing

May 29, 2026
Via arXiv

Estimating Tail Risks in Language Model Output Distributions

Apr 24, 2026
Via arXiv

Factuality on Demand: Controlling the Factuality-Informativeness Trade-off in Text Generation

Jan 31, 2026
Via arXiv

Is It Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort

Oct 01, 2025
Via arXiv

Jailbreak Strength and Model Similarity Predict Transferability

Jun 15, 2025
Via arXiv

Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors

Jun 12, 2025
Via arXiv

Unsupervised Elicitation of Language Models

Jun 11, 2025
Via arXiv

Beyond Memorization: Mapping the Originality-Quality Frontier of Language Models

Apr 13, 2025
Via arXiv

Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification

Apr 07, 2025
Via arXiv

Transformers Struggle to Learn to Search

Dec 06, 2024
Via arXiv